Description

[0001] The present invention relates to DNA sequences encoding novel growth/differentiation factors of the TGF-β
family. In particular, it relates to novel DNA sequences encoding TGF-β-like proteins, to the isolation of said DNA
sequences, to expression plasmids containing said DNA, to microorganisms transformed by said expression plasmid,
to the production of said protein by culturing said transformant, and to pharmaceutical compositions containing said
protein. The TGF-β family of growth factors comprising BMP, TGF, and inhibin related proteins (Roberts and Sporn,
Handbook of Experimental Pharmacology 95 (1990), 419-472) is of particular relevance in a wide range of medical
treatments and applications. These factors are useful in processes relating to wound healing and tissue repair.
Furthermore, several members of the TGF-β family are tissue inductive, especially osteoinductive, and consequently
play a crucial role in inducing cartilage and bone development.

[0002] Wozney, Progress in Growth Factor Research 1 (1989), 267-280 and Vale et al., Handbook of Experimental
Pharmacology 95 (1990), 211-248 describe different growth factors such as those relating to the BMP (bone
morphogenetic proteins) and the inhibin group. The members of these groups share significant structural similarity.
The precursor of the protein is composed of an aminoterminal signal sequence, a propeptide and a carboxyterminal
sequence of about 110 amino acids, which is subsequently cleaved from the precursor and represents the mature
protein. Furthermore, their members are defined by virtue of amino acid sequence homology. The mature protein
contains the most conserved sequences, especially seven cysteine residues which are conserved among the family
members. The TGF-β-like proteins are multifunctional, hormonally active growth factors. They also share related
biological activities such as chemotactic attraction of cells, promoting cell differentiation and their tissue-inducing
capacity, such as cartilage- and bone-inducing capacity. U.S. Patent No. 5,013,649 discloses DNA sequences encoding
osteoinductive proteins termed BMP-2 proteins (bone morphogenetic protein), and U.S. patent applications serial
nos. 179 101 and 179 197 disclose the BMP proteins BMP-1 and BMP-3. Furthermore, many cell types are able to
synthesize TGF-β-like proteins and virtually all cells possess TGF-β receptors.

[0003] Taken together, these proteins show differences in their structure, leading to considerable variation in their
detailed biological function. Furthermore, they are found in a wide variety of different tissues and developmental
stages. Consequently, they might possess differences concerning their function in detail, for instance the required
cellular physiological environment, their lifespan, their targets, their requirement for accessory factors, and their
resistance to degradation. Thus, although numerous proteins exhibiting tissue-inductive, especially osteoinductive
potential are described, their natural role in the organism and, more importantly, their medical relevance must still
be elucidated in detail. The occurrence of still-unknown members of the TGF-β family relevant for osteogenesis or
differentiation/induction of other tissues is strongly suspected. However, a major problem in the isolation of these
new TGF-β-like proteins is that their functions cannot yet be described precisely enough for the design of a
discriminative bioassay. On the other hand, the expected nucleotide sequence homology to known members of the
family would be too low to allow for screening by classical nucleic acid hybridization techniques. Nevertheless, the
further isolation and characterization of new TGF-β-like proteins is urgently needed in order to get hold of the whole
set of induction and differentiation proteins meeting all desired medical requirements. These factors might find
useful medical applications in defect healing and treatments of degenerative disorders of bone and/or other tissues
like, for example, kidney and liver.
[0004] Thus, the technical problem underlying the present invention essentially is to provide DNA sequences coding
for new members of the TGF-β protein family having mitogenic and/or differentiation-inductive, e.g. osteoinductive
potential.

[0005] The solution to the above technical problem is achieved by providing the embodiments characterized in claims
1 to 18. Other features and advantages of the invention will be apparent from the description of the preferred
embodiments and the drawings. The sequence listings and drawings will now briefly be described.

SEQ ID NO. 1 shows the nucleotide sequence of MP-52, i.e. the embryo-derived sequence corresponding to the
mature peptide and most of the sequence coding for the propeptide of MP-52.

Some of the propeptide sequence at the 5' end of MP-52 has not been characterized so far.

SEQ ID NO. 2 shows the so far characterized nucleotide sequence of the liver-derived sequence MP-121.

SEQ ID NO. 3 shows the amino acid sequence of MP-52 as deduced from SEQ ID NO. 1.

Figure 1 shows an alignment of the amino acid sequences of MP-52 and MP-121 with some related proteins. 1a
shows the alignment of MP-52 with some members of the BMP protein family starting from the first of the seven
conserved cysteines; 1b shows the alignment of MP-121 with some members of the inhibin protein family. * indicates
that the amino acid is the same in all proteins compared; + indicates that the amino acid is the same in at least one
of the proteins compared with MP-52 (Fig. 1a) or MP-121 (Fig. 1b).
Figure 2 shows the nucleotide sequences of the oligonucleotide primers as used in the present invention and an
alignment of these sequences with known members of the TGF-β family. M means A or C; S means C or G; R
means A or G; and K means G or T. 2a depicts the sequence of the primer OD; 2b shows the sequence of the
primer OID.
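
The degenerate symbols in the Figure 2 legend can be expanded mechanically. The following is a minimal sketch of that expansion; the function name `expand` and the 6-mer used as input are hypothetical illustrations and are not the sequences of primers OD or OID, which are shown only in the figure.

```python
from itertools import product

# Degenerate symbols exactly as defined in the Figure 2 legend.
DEGENERATE = {"M": "AC", "S": "CG", "R": "AG", "K": "GT"}


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [DEGENERATE.get(base, base) for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical 6-mer for illustration; it is not primer OD or OID.
variants = expand("ATMGSK")
print(len(variants))   # 8 = 2 * 2 * 2
print(variants[0])     # ATAGCG
```

Each degenerate position doubles the number of concrete oligonucleotides in the primer mixture, which is why degeneracy is normally confined to the few positions where the aligned family members actually differ.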


[0006] The present invention relates to novel TGF-β-like proteins and provides DNA sequences contained in the
corresponding genes. Such sequences include nucleotide sequences comprising the sequence

ATGAACTCCATGGACCCCGAGTCCACA and

CTTCTCAAGGCCAACACAGCTGCAGGCACC
and in particular sequences as illustrated in SEQ ID Nos. 1 and 2, allelic derivatives of said sequences and DNA
sequences degenerated as a result of the genetic code for said sequences. They also include DNA sequences
hybridizing under stringent conditions with the DNA sequences mentioned above and containing the following amino
acid sequences:

Met-Asn-Ser-Met-Asp-Pro-Glu-Ser-Thr or
Leu-Leu-Lys-Ala-Asn-Thr-Ala-Ala-Gly-Thr.
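
The correspondence between the two nucleotide motifs and the two peptide motifs can be checked by translating each motif from its first nucleotide. The sketch below is illustrative only: the codon dictionary lists just the standard-genetic-code codons that occur in these two motifs, and the names `CODONS`, `translate`, `MOTIF_A` and `MOTIF_B` are introduced here for the example.

```python
# Partial codon table (standard genetic code), limited to codons found in the two motifs.
CODONS = {
    "ATG": "Met", "AAC": "Asn", "TCC": "Ser", "GAC": "Asp", "CCC": "Pro",
    "GAG": "Glu", "ACA": "Thr", "CTT": "Leu", "CTC": "Leu", "AAG": "Lys",
    "GCC": "Ala", "GCT": "Ala", "GCA": "Ala", "GGC": "Gly", "ACC": "Thr",
}


def translate(dna):
    """Translate a DNA motif in frame 1 into a hyphenated three-letter peptide."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "-".join(CODONS[codon] for codon in codons)


MOTIF_A = "ATGAACTCCATGGACCCCGAGTCCACA"
MOTIF_B = "CTTCTCAAGGCCAACACAGCTGCAGGCACC"

print(translate(MOTIF_A))  # Met-Asn-Ser-Met-Asp-Pro-Glu-Ser-Thr
print(translate(MOTIF_B))  # Leu-Leu-Lys-Ala-Asn-Thr-Ala-Ala-Gly-Thr
```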

[0007] Although said allelic, degenerate and hybridizing sequences may have structural divergencies due to naturally 
occurring mutations, such as small deletions or substitutions, they will usually still exhibit essentially the same useful 
properties, allowing their use in basically the same medical applications. 

[0008] According to the present invention, the term "hybridization" means conventional hybridization conditions,
preferably conditions with a salt concentration of 6 x SSC at 62° to 66°C followed by a one-hour wash with 0.6 x SSC,
0.1% SDS at 62° to 66°C. The term "hybridization" preferably refers to stringent hybridization conditions with a salt
concentration of 4 x SSC at 62°-66°C followed by a one-hour wash with 0.1 x SSC, 0.1% SDS at 62°-66°C.
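
To compare the two sets of conditions numerically, the SSC multiples can be converted into approximate sodium ion concentrations. The sketch below assumes the commonly used definition of 1 x SSC (150 mM NaCl, 15 mM trisodium citrate); that definition is not given in the text, and the function name `sodium_mM` is hypothetical.

```python
# Assumed composition of 1 x SSC (not stated in the text).
NACL_MM_1X = 150.0
CITRATE_MM_1X = 15.0  # trisodium citrate contributes 3 Na+ per molecule


def sodium_mM(x_ssc):
    """Approximate Na+ concentration (mM) of an SSC solution of the given strength."""
    return x_ssc * (NACL_MM_1X + 3 * CITRATE_MM_1X)


for label, strength in [
    ("conventional hybridization", 6.0),
    ("conventional wash", 0.6),
    ("stringent hybridization", 4.0),
    ("stringent wash", 0.1),
]:
    print(f"{label}: {strength} x SSC = {sodium_mM(strength):.1f} mM Na+")
```

Under this assumption the stringent wash contains about one sixth of the sodium of the conventional wash at the same temperature, which is what makes it more discriminating.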
[0009] Important biological activities of the encoded proteins comprise a mitogenic and osteoinductive potential and
can be determined in assays according to Roberts et al., PNAS 78 (1981), 5339-5343, Seyedin et al., PNAS 82 (1985),
2267-2271 or Sampath and Reddi, PNAS 78 (1981), 7599-7603.

[0010] Preferred embodiments of the present invention are DNA sequences as defined above and obtainable from
vertebrates, preferably mammals such as pig or cow and from rodents such as rat or mouse, and in particular from
primates such as humans.
[0011] Particularly preferred embodiments of the present invention are the DNA sequences termed MP-52 and
MP-121 which are shown in SEQ ID Nos. 1 and 2. The corresponding transcripts of MP-52 were obtained from
embryogenic tissue and code for a protein showing considerable amino acid homology to the mature part of the
BMP-like proteins (see Fig. 1a). The protein sequences of BMP2 (=BMP2A) and BMP4 (=BMP2B) are described in
Wozney et al., Science Vol 242, 1528-1534 (1988). The respective sequences of BMP5, BMP6 and BMP7 are described
in Celeste et al., Proc. Natl. Acad. Sci. USA Vol 87, 9843-9847 (1990). Some typical sequence homologies, which are
specific to known BMP-sequences only, were also found in the propeptide part of MP-52, whereas other parts of the
precursor part of MP-52 show marked differences to BMP-precursors. The mRNA of MP-121 was detected in liver
tissue, and its corresponding amino acid sequence shows homology to the amino acid sequences of the Inhibin
protein chains (see Fig. 1b). cDNA sequences encoding TGF-β-like proteins have not yet been isolated from liver
tissue, probably due to a low abundance of TGF-β specific transcripts in this tissue. In embryogenic tissue, however,
sequences encoding known TGF-β-like proteins can be found in relative abundance. The inventors have recently
detected the presence of a collection of TGF-β-like proteins in liver as well. The high background level of clones
related to known factors of this group presents the main difficulty in establishing novel TGF-β-related sequences
from these and probably other tissues. In the present invention, the cloning was carried out according to the method
described below. Once the DNA sequence has been cloned, the preparation of host cells capable of producing the
TGF-β-like proteins and the production of said proteins can be easily accomplished using known recombinant DNA
techniques comprising constructing the expression plasmids encoding said protein and transforming a host cell with
said expression plasmid, cultivating the transformant in a suitable culture medium, and recovering the product
having TGF-β-like activity.

[0012] Thus, the invention also relates to recombinant molecules comprising DNA sequences as described above,
optionally linked to an expression control sequence. Such vectors may be useful in the production of TGF-β-like
proteins in stably or transiently transformed cells. Several animal, plant, fungal and bacterial systems may be
employed for the transformation and subsequent cultivation process. Preferably, expression vectors which can be
used in the invention contain sequences necessary for the replication in the host cell and are autonomously
replicable. It is also preferable to use vectors containing selectable marker genes which allow easy selection of
transformed cells. The necessary operation is well-known to those skilled in the art.

[0013] It is another object of the invention to provide a host cell transformed by an expression plasmid of the invention
and capable of producing a protein of the TGF-β family. Examples of suitable host cells include various eukaryotic and
prokaryotic cells, such as E. coli, insect cells, plant cells, mammalian cells, and fungi such as yeast.
[0014] Another object of the present invention is to provide a protein of the TGF-β family encoded by the DNA
sequences described above and displaying biological features such as tissue-inductive, in particular osteo-inductive
and/or myogenic capacities possibly relevant to therapeutical treatments. The above-mentioned features of the protein
might vary depending upon the formation of homodimers or heterodimers. Such structures may prove useful in clinical
applications as well. The amino acid sequence of an especially preferred protein of the TGF-β family (MP-52) is shown
in SEQ ID NO. 3.

[0015] It is a further aspect of the invention to provide a process for the production of TGF-β-like proteins. Such a
process comprises cultivating a host cell being transformed with a DNA sequence of the present invention in a suitable
culture medium and purifying the TGF-β-like protein produced. Thus, this process will allow the production of a
sufficient amount of the desired protein for use in medical treatments or in applications using cell culture techniques
requiring growth factors for their performance. The host cell is obtainable from bacteria such as Bacillus or
Escherichia coli, from fungi such as yeast, from plants such as tobacco, potato, or Arabidopsis, and from animals, in
particular vertebrate cell lines such as the Mo-, COS- or CHO cell line.

[0016] Yet another aspect of the present invention is to provide a particularly sensitive process for the isolation of DNA
sequences corresponding to low abundance mRNAs in the tissues of interest. The process of the invention comprises
the combination of four different steps. First, the mRNA has to be isolated and used in an amplification reaction using
oligonucleotide primers. The sequence of the oligonucleotide primers contains degenerated DNA sequences derived
from the amino acid sequence of proteins related to the gene of interest. This step may lead to the amplification of
already known members of the gene family of interest, and these undesired sequences would therefore have to be
eliminated. This object is achieved by using restriction endonucleases which are known to digest the already-analyzed
members of the gene family. After treatment of the amplified DNA population with said restriction endonucleases, the
remaining desired DNA sequences are isolated by gel electrophoresis and reamplified in a third step by an amplification
reaction, and in a fourth step they are cloned into suitable vectors for sequencing. To increase the sensitivity and
efficiency, steps two and three are repeatedly performed, at least two times in one embodiment of this process.
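
The elimination step (step two) can be pictured as a filter that discards every amplification product containing a recognition site for one of the chosen enzymes. The following is a minimal in-silico sketch under stated assumptions: the recognition sequences are the commonly published ones for the enzymes used in Example 1 and are not given in the text, the three toy amplicons are invented for illustration, and the helper names are hypothetical. Because all four sites are palindromic, searching one strand is sufficient.

```python
import re

# IUPAC symbols needed for the recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "W": "[AT]", "N": "[ACGT]"}

# Commonly published recognition sequences (an assumption; not stated in the text).
SITES = {"SphI": "GCATGC", "AvaI": "CYCGRG", "AlwNI": "CAGNNNCTG", "TfiI": "GAWTC"}


def site_regex(site):
    """Compile a recognition sequence with IUPAC symbols into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in site))


def is_cut(seq, enzymes):
    """True if any of the given enzymes has a recognition site in seq."""
    return any(site_regex(SITES[name]).search(seq) for name in enzymes)


def surviving(amplicons, enzymes):
    """Keep only amplicons that remain uncleaved by every enzyme in the set."""
    return {name: seq for name, seq in amplicons.items() if not is_cut(seq, enzymes)}


# Invented toy amplicons, for illustration only.
amplicons = {
    "known_1": "AAGCATGCTT",      # carries an SphI site
    "known_2": "TTCAGACGCTGAA",   # carries an AlwNI site
    "novel": "AAGGTTCCAATT",      # carries none of the four sites
}

print(sorted(surviving(amplicons, ["SphI", "AlwNI"])))          # ['novel']
print(sorted(surviving(amplicons, ["AvaI", "AlwNI", "TfiI"])))  # ['known_1', 'novel']
```

The second call shows why the choice of enzyme combination matters: a known family member that lacks sites for the chosen enzymes passes the filter and remains in the background.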
[0017] In a preferred embodiment the isolation process described above is used for the isolation of DNA sequences
from liver tissue. In a particularly preferred embodiment of the above-described process, one primer used for the PCR
experiment is homologous to the polyA tail of the mRNA, whereas the second primer contains a gene-specific
sequence. The techniques employed in carrying out the different steps of this process (such as amplification reactions
or sequencing techniques) are known to the person skilled in the art and described, for instance, in Sambrook et al.,
1989, "Molecular Cloning: A laboratory manual", Cold Spring Harbor Laboratory Press.

[0018] It is another object of the present invention to provide pharmaceutical compositions containing a
therapeutically-effective amount of a protein of the TGF-β family of the present invention. Optionally, such a
composition comprises a pharmaceutically acceptable carrier. Such a therapeutic composition can be used in wound
healing and tissue repair as well as in the healing of bone, cartilage, or tooth defects, either individually or in
conjunction with suitable carriers, and possibly with other related proteins or growth factors. Thus, a therapeutic
composition of the invention may include, but is not limited to, the MP-52 encoded protein in conjunction with the
MP-121 encoded protein, and optionally with other known biologically-active substances such as EGF (epidermal
growth factor) or PDGF (platelet derived growth factor). Another possible clinical application of a TGF-β-like protein
is the use as a suppressor of the immune response, which would prevent rejection of organ transplants. The
pharmaceutical composition comprising the proteins of the invention can also be used prophylactically, or can be
employed in cosmetic plastic surgery. Furthermore, the application of the composition is not limited to humans but
can include animals, in particular domestic animals, as well.
[0019] Finally, another object of the present invention is an antibody or antibody fragment which is capable of
specifically binding to the proteins of the present invention. Methods to raise such specific antibodies are general
knowledge. Preferably such an antibody is a monoclonal antibody. Such antibodies or antibody fragments might be
useful for diagnostic methods.

[0020] The following examples illustrate in detail the invention disclosed, but should not be construed as limiting the
invention.

Example 1

[0021]

1.1 Total RNA was isolated from human liver tissue (40-year-old male) by the method of Chirgwin et al.,
Biochemistry 18 (1979), 5294-5299. Poly(A)+ RNA was separated from total RNA by oligo (dT) chromatography
according to the instructions of the manufacturer (Stratagene Poly (A) Quick columns).

1.2 For the reverse transcription reaction, poly(A)+ RNA (1-2.5 µg) derived from liver tissue was heated for 5 minutes
to 65°C and cooled rapidly on ice. The reverse transcription reagents containing 27 U RNA guard (Pharmacia), 2.5
µg oligo d(T)12-18 (Pharmacia), 5 x buffer (250 mM Tris/HCl pH 8.5; 50 mM MgCl2; 50 mM DTT; 5 mM each dNTP;
600 mM KCl) and 20 units avian myeloblastosis virus reverse transcriptase (AMV, Boehringer Mannheim) per µg
poly(A)+ RNA were added. The reaction mixture (25 µl) was incubated for 2 hours at 42°C. The liver cDNA pool
was stored at -20°C.
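
The working concentrations in the reverse transcription reaction follow from the 5 x buffer composition by simple dilution. The sketch below assumes that the buffer is used at 1 x final strength in the 25 µl reaction, which the text implies but does not state explicitly; the names `STOCK_5X_MM`, `dilute` and `buffer_volume_ul` are hypothetical.

```python
# Composition of the 5 x reverse transcription buffer as given in step 1.2 (mM).
STOCK_5X_MM = {"Tris/HCl": 250, "MgCl2": 50, "DTT": 50, "each dNTP": 5, "KCl": 600}


def dilute(stock, fold=5):
    """Return final concentrations (mM) after diluting a stock by the given fold."""
    return {component: conc / fold for component, conc in stock.items()}


def buffer_volume_ul(reaction_ul, fold=5):
    """Volume of concentrated buffer needed for a reaction of the given volume."""
    return reaction_ul / fold


final = dilute(STOCK_5X_MM)
print(final)                   # Tris/HCl 50, MgCl2 10, DTT 10, each dNTP 1, KCl 120 (mM)
print(buffer_volume_ul(25.0))  # 5.0 µl of 5 x buffer per 25 µl reaction
```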

1.3 The deoxynucleotide primers OD and OID (Fig. 2) designed to prime the amplification reaction were generated
on an automated DNA-synthesizer (Biosearch). Purification was done by denaturing polyacrylamide gel
electrophoresis and isolation of the main band from the gel by isotachophoresis. The oligonucleotides were designed
by aligning the nucleic acid sequences of some known members of the TGF-β family and selecting regions of the
highest conservation. An alignment of this region is shown in Fig. 2. In order to facilitate cloning, both oligonucleotides
contained EcoR I restriction sites and OD additionally contained an Nco I restriction site at its 5' terminus.

1.4 In the polymerase chain reaction, a liver-derived cDNA pool was used as a template in a 50 µl reaction mixture.
The amplification was performed in 1 x PCR-buffer (16.6 mM (NH4)2SO4; 67 mM Tris/HCl pH 8.8; 2 mM MgCl2;
6.7 µM EDTA; 10 mM β-mercaptoethanol; 170 ng/ml BSA (Gibco)), 200 µM each dNTP (Pharmacia), 30 pmol each
oligonucleotide (OD and OID) and 1.5 units Taq polymerase (AmpliTaq, Perkin Elmer Cetus). The PCR reaction
contained cDNA corresponding to 30 ng of poly(A)+ RNA as starting material. The reaction mixture was overlaid
with paraffin and 40 cycles (cycle 1: 80s 93°C/40s 52°C/40s 72°C; cycles 2-9: 60s 93°C/40s 52°C/40s 72°C; cycles
10-29: 60s 93°C/40s 52°C/60s 72°C; cycles 30-39: 60s 93°C/40s 52°C/90s 72°C; cycle 40: 60s 93°C/40s
52°C/420s 72°C) of the PCR were performed. Six PCR-reaction mixtures were pooled, purified by subsequent
extractions with equal volumes of phenol, phenol/chloroform (1:1 (v/v)) and chloroform/isoamyl alcohol (24:1 (v/v))
and concentrated by ethanol precipitation.
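
The cycling program in step 1.4 is easier to check when written out as data. The sketch below tabulates the five blocks and sums the programmed hold times; it assumes that the fourth block runs from cycle 30 through cycle 39, so that the blocks add up to the stated 40 cycles, and it ignores temperature ramp times, which the text does not give. The names `PROGRAM`, `total_cycles` and `total_seconds` are hypothetical.

```python
# (first cycle, last cycle, denaturation s at 93°C, annealing s at 52°C, extension s at 72°C)
PROGRAM = [
    (1, 1, 80, 40, 40),
    (2, 9, 60, 40, 40),
    (10, 29, 60, 40, 60),
    (30, 39, 60, 40, 90),    # assumption: this block ends at cycle 39
    (40, 40, 60, 40, 420),
]


def total_cycles(program):
    """Number of cycles covered by all blocks."""
    return sum(last - first + 1 for first, last, *_ in program)


def total_seconds(program):
    """Sum of programmed hold times, excluding ramping."""
    return sum((last - first + 1) * (den + ann + ext)
               for first, last, den, ann, ext in program)


print(total_cycles(PROGRAM))        # 40
print(total_seconds(PROGRAM) / 60)  # 115.0 minutes of programmed holds
```

The extension time grows from 40 s to 90 s over the run and ends with a single 420 s extension, a pattern that the 13-cycle reamplification in step 1.7 repeats in shortened form.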

1.5 One half of the obtained PCR pool was sufficient for digestion with the restriction enzymes Sph I (Pharmacia)
and AlwN I (Biolabs). The second half was digested in a series of reactions by the restriction enzymes Ava I (BRL),
AlwN I (Biolabs) and Tfi I (Biolabs). The restriction endonuclease digestions were performed in 100 µl at 37°C
(except Tfi I at 65°C) using 8 units of each enzyme in a 2- to 12-hour reaction in a buffer recommended by the
manufacturer.

1.6 Each DNA sample was fractionated by electrophoresis using a 4% agarose gel (3% FMC Nusieve agarose,
Biozym and 1% agarose, BRL) in Tris borate buffer (89 mM Tris base, 89 mM boric acid, 2 mM EDTA, pH 8). After
ethidium bromide staining, uncleaved amplification products (about 200 bp; size marker was run in parallel) were
excised from the gel and isolated by phenol extraction: an equal volume of phenol was added to the excised
agarose, which was minced to small pieces, frozen for 10 minutes, vortexed and centrifuged. The aqueous phase was
collected, the interphase reextracted by the same volume TE-buffer, centrifuged and both aqueous phases were
combined. DNA was further purified twice by phenol/chloroform and once by chloroform/isoamyl alcohol extraction.

1.7 After ethanol precipitation, one fourth or one fifth of the isolated DNA was reamplified using the same conditions
used for the primary amplification except for diminishing the number of cycles to 13 (cycle 1: 80s 93°C/40s
52°C/40s 72°C; cycles 2-12: 60s 93°C/40s 52°C/60s 72°C; cycle 13: 60s 93°C/40s 52°C/420s 72°C). The
reamplification products were purified, restricted with the same enzymes as above and the uncleaved products were
isolated from agarose gels as mentioned above for the amplification products. The reamplification followed by
restriction and gel isolation was repeated once.

1.8 After the last isolation from the gel, the amplification products were digested by 4 units EcoR I (Pharmacia) for
2 hours at 37°C using the buffer recommended by the manufacturer. One fourth of the restriction mixture was
ligated to the vector pBluescript II SK+ (Stratagene) which was digested likewise by EcoR I. After ligation, 24 clones
from each enzyme combination were further analyzed by sequence analysis. The sample restricted by AlwN I and
Sph I contained no new sequences, only BMP6 and Inhibin βA sequences. 19 identical new sequences, which
were named MP-121, were found in the Ava I, AlwN I and Tfi I restricted samples. One sequence differed from this
mainly-found sequence by two nucleotide exchanges. Ligation reaction and transformation in E. coli HB101 were
performed as described in Sambrook et al., Molecular cloning: A laboratory manual (1989). Transformants were
selected by Ampicillin resistance and the plasmid DNAs were isolated according to standard protocols (Sambrook
et al. (1989)). Analysis was done by sequencing the double-stranded plasmids by "dideoxyribonucleotide chain
termination sequencing" with the sequencing kit "Sequenase Version 2.0" (United States Biochemical Corporation).
The clone was completed to the 3' end of the cDNA by a method described in detail by Frohman (Amplifications,
published by Perkin-Elmer Corporation, issue 5 (1990), pp 11-15). The same liver mRNA which was used for the
isolation of the first fragment of MP-121 was reverse transcribed using a primer consisting of oligo dT (16 residues)
linked to an adaptor primer (AGAATTCGCATGCCATGGTCGACGAAGC(T16)). Amplification was performed using
the adaptor primer (AGAATTCGCATGCCATGGTCGACG) and an internal primer (GGCTACGCCATGAACTTCTGCATA)
of the MP-121 sequence. The amplification products were reamplified using a nested internal primer
(ACATAGCAGGCATGCCTGGTATTG) of the MP-121 sequence and the adaptor primer. The reamplification products
were cloned after restriction with Sph I in the likewise restricted vector pT7/T3 U19 (Pharmacia) and
sequenced with the sequencing kit "Sequenase Version 2.0" (United States Biochemical Corporation). Clones
were characterized by their sequence overlap to the 3' end of the known MP-121 sequence.
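
The adaptor primer is built so that the amplified 3' ends can be cut and cloned with different enzymes. As an illustration, the sketch below scans the adaptor sequence, taken as it is written in Example 2, for four recognition sequences. The recognition sequences are the commonly published ones and are not stated in the text; the Sal I site is an observation added here and is not used in the examples. The names `SITES`, `ADAPTOR` and `find_sites` are hypothetical.

```python
# Commonly published recognition sequences (an assumption; not stated in the text).
SITES = {"EcoRI": "GAATTC", "SphI": "GCATGC", "NcoI": "CCATGG", "SalI": "GTCGAC"}

# Adaptor primer as written in Example 2.
ADAPTOR = "AGAATTCGCATGCCATGGTCGACG"


def find_sites(seq):
    """Map each enzyme name to the 0-based position of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


print(find_sites(ADAPTOR))  # {'EcoRI': 1, 'SphI': 7, 'NcoI': 12, 'SalI': 17}
```

The Sph I site is the one used to clone the MP-121 3' end in this example, and the Nco I site is the one used for MP-52 in Example 2.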

Example 2

Isolation of MP-52 

[0022] A further cDNA sequence, MP-52, was isolated according to the above described method (Example 1) by
using RNA from human embryo (8-9 weeks old) tissue. The PCR reaction contained cDNA corresponding to 20 ng of
poly(A)+ RNA as starting material. The reamplification step was repeated twice for both enzyme combinations. After
ligation, 24 clones from each enzyme combination were further analyzed by sequence analysis. The sample restricted
by AlwN I and Sph I yielded a new sequence which was named MP-52. The other clones comprised mainly BMP6 and
one BMP7 sequence. The sample restricted by Ava I, AlwN I and Tfi I contained no new sequences, but consisted
mainly of BMP7 and a few Inhibin βA sequences.

[0023] The clone was completed to the 3' end according to the above described method (Example 1). The same
embryo mRNA which was used for the isolation of the first fragment of MP-52 was reverse transcribed as in Example
1. Amplification was performed using the adaptor primer (AGAATTCGCATGCCATGGTCGACG) and an internal primer
(CTTGAGTACGAGGCTTTCCACTG) of the MP-52 sequence. The amplification products were reamplified using a
nested adaptor primer (ATTCGCATGCCATGGTCGACGAAG) and a nested internal primer
(GGAGCCCACGAATCATGCAGTCA) of the MP-52 sequence. The reamplification products were cloned after restriction
with Nco I in a likewise restricted vector (pUC 19 (Pharmacia #27-4951-01) with an altered multiple cloning site
containing a unique Nco I restriction site) and sequenced. Clones were characterized by their sequence overlap to the
3' end of the known MP-52 sequence. Some of these clones contain the last 143 basepairs of the 3' end of the sequence
shown in SEQ ID NO: 1 and the 0.56 kb 3' non-translated region (sequence not shown). One of these was used as a
probe to screen a human genomic library (Stratagene #946203) by a common method described in detail by Ausubel
et al. (Current Protocols in Molecular Biology, published by Greene Publishing Associates and Wiley-Interscience
(1989)). From 8 x 10^5 λ phages, one phage (λ 2.7.4), which was proved to contain an insert of about 20 kb, was
isolated and deposited with the DSM (#7387). This clone contains, in addition to the sequence isolated from mRNA by
the described amplification methods, sequence information further to the 5' end. For sequence analysis a Hind III
fragment of about 7.5 kb was subcloned in a likewise restricted vector (Bluescript SK, Stratagene #212206). This
plasmid, called SKL 52 (H3) MP12, was also deposited with the DSM (#7353). Sequence information derived from this
clone is shown in SEQ ID NO: 1. At nucleotide No. 1050, the determined cDNA and the respective genomic sequence
differ by one basepair (cDNA: G; genomic DNA: A). We assume the genomic sequence to be correct as it was
confirmed also by sequencing of the amplified genomic DNA from embryonic tissue which had been used for the mRNA
preparation. The genomic DNA contains an intron of about 2 kb between basepairs 332 and 333 of SEQ ID NO: 1. The
sequence of the intron is not shown. The correct exon/exon junction was confirmed by sequencing an amplification
product derived from cDNA which comprises this region. This sequencing information was obtained with the help of a
slightly modified method described in detail by Frohman (Amplifications, published by Perkin-Elmer Corporation,
issue 5 (1990), pp 11-15). The same embryo RNA which was used for the isolation of the 3' end of MP-52 was reverse
transcribed using an internal primer of the MP-52 sequence oriented in the 5' direction
(ACAGCAGGTGGGTGGTGTGGACT). A polyA tail was appended to the 5' end of the first strand cDNA by using terminal
transferase. A two step amplification was performed first by application of a primer consisting of oligo dT and an
adaptor primer (AGAATTCGCATGCCATGGTCGACGAAGC(T16)) and secondly an adaptor primer
(AGAATTCGCATGCCATGGTCGACG) and an internal primer (CCAGCAGCCCATCCTTCTCC) of the MP-52 sequence. The
amplification products were reamplified using the same adaptor primer and a nested internal primer
(TCCAGGGCACTAATGTCAAACACG) of the MP-52 sequence. Consecutively the reamplification products were again
reamplified using a nested adaptor primer (ATTCGCATGCCATGGTCGACGAAG) and a nested internal primer
(ACTAATGTCAAACACGTACCTCTG) of the MP-52 sequence. The final reamplification products were blunt end cloned
in a vector (Bluescript SK, Stratagene #212206) restricted with EcoRV. Clones were characterized by their sequence
overlap to the DNA of λ 2.7.4.
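
The nesting of the primers used for the 5' extension can be read directly from their sequences. The following sketch, with the hypothetical helper `suffix_prefix_overlap`, measures how far the second nested internal primer is shifted relative to the first and where the nested adaptor primer sits inside the oligo dT adaptor; it illustrates the primer layout only and makes no claim about the primers' positions in SEQ ID NO: 1.

```python
def suffix_prefix_overlap(outer, nested):
    """Length of the longest suffix of `outer` that is also a prefix of `nested`."""
    for k in range(min(len(outer), len(nested)), 0, -1):
        if outer.endswith(nested[:k]):
            return k
    return 0


# Internal primers of the two successive reamplifications, as given in the text.
INTERNAL_1 = "TCCAGGGCACTAATGTCAAACACG"
INTERNAL_2 = "ACTAATGTCAAACACGTACCTCTG"

# Adaptor part of the oligo dT primer (without the T16 tail) and the nested adaptor primer.
ADAPTOR_FULL = "AGAATTCGCATGCCATGGTCGACGAAGC"
ADAPTOR_NESTED = "ATTCGCATGCCATGGTCGACGAAG"

overlap = suffix_prefix_overlap(INTERNAL_1, INTERNAL_2)
print(overlap)                            # 16 shared nucleotides
print(len(INTERNAL_2) - overlap)          # second primer extends 8 nt further 3'
print(ADAPTOR_FULL.find(ADAPTOR_NESTED))  # nested adaptor starts 3 nt into the adaptor
```

Shifting each primer a few nucleotides further toward its 3' end means that the last rounds extend only products carrying the matching sequence just inside both ends of the previous product, which is the purpose of nesting.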

[0024] Plasmid SKL 52 (H3) MP12 was deposited under number 7353 at DSM (Deutsche Sammlung von
Mikroorganismen und Zellkulturen), Mascheroder Weg 1b, 38124 Braunschweig, on 10.12.1992.
[0025] Phage λ 2.7.4 was deposited under number 7387 at DSM on 13.1.1993.
SEQ ID NO: 1

SEQUENCE TYPE: Nucleotide
SEQUENCE LENGTH: 1207 base pairs
STRANDEDNESS: double
TOPOLOGY:
MOLECULAR TYPE: DNA

ORIGINAL SOURCE: -
ORGANISM: human
IMMEDIATE EXPERIMENTAL SOURCE: Embryo tissue

PROPERTIES: Sequence coding for human TGF-β-like protein (MP-52)



mo3qg0qqc cctgaacoca mocmgrca 00cto0c3 ca a acaaggcagg ctacmcodg 60 

25 grctotcaoc ocmaaggac agcitooogg aggcaaggca gooocaaaag caqgaiciot 120 

coocftocroc agraggogag ggmooogqg o000gaggag a300caagga 180 

(jcx a omua: ceaoooooc a tcacacooca cgkttacatc aciarrai 1 a^ggaoqct 240 

GTOOGATOCT GACAGAAAGG GAGGGAACAG GAGOGTOGAAG CTSaGGCTG OXUO3 0 CA A 300 

30 OV0CATCADC MOTTKnG ACAAAGGQCA AGKPGAOOQt QGTO00OTGG TCAGGAAGC* 360 

GAGC7EACGIG TT1EACATXA CTGOOCTOGA GAAGQKGGG CTGCIGGGGG O0GRGCTG0G 420 

GA3CTTQ0GG AAGAAG00CT GGQCAOGGC CAAG0CAG0G QOOOOOO a G UXX^G GG L' 480 

TOOO CA QCTG AAG CT OTO CA OCTOOOOCAG CQ UU 0QG GA G COGQOCTOCT lttaaaaKSL ' 540 


Gocaci o o giG ocaqqoctgg aoggaicigg ctog ga gctg TraacAicr ggaagctot 600 

OOGftAACrrr AAGAACT0GG COCAGCTOIG CClUaftGCTG GW3G0CT3GG AAOGGGGGAG 660 

GGooaroGAC craxroGa oc tgggcttoga couu a igc oggcmoioc Acsraraa: 720 

" OC lOTlUaU GI\ aiTX G30C GCBCCAA»A MQQGAOCTG TTCTTEAATO AQflSAAGGC 780 

COQCiClUUL: CAGraCSflA XZOCGTG^ TTCAG0CAGC GQ0GAAAAOG 840 

G0GGQQ00CA CIG30CACTC QXAG33CAA G0GBO0GAGC AAfflAuurm A3GCTOGCTO 900 

GAG1CGGAAG GCACTGGKFG TCAACTTCAA GGACA1GGGC TOGGACtaCT GGRTCATOQC 960 

AOO OCTlUfi G TACGMGCXT TOCACTOOGA GGUULTKJrGC CKHTOXAX TQOGCTOCCA 1020 

OCTOGAG00C ACGAA1GA1G CAGTCATOCA (MOCTGATG AACICCATOG ADOOOGAGIC 1080 

CACAOCAO0C AXTQCTCTG TQ00CAOQ0G GC1GAQIC0C ATCAGCATOC TCTTCA3TCA 1140 

50 CTCTQOCAAC MCOTGOTOT A3AAGCACTA 1GAGGACKFG OTOGTCGAGT CGMGK3QCTG 1200 

CMGISiG 1207 



SEQ ID NO: 2

SEQUENCE TYPE: Nucleotide
SEQUENCE LENGTH: 265 base pairs
STRANDEDNESS: Single
TOPOLOGY: Linear
MOLECULAR TYPE: cDNA to mRNA

ORIGINAL SOURCE:
ORGANISM: Human
IMMEDIATE EXPERIMENTAL SOURCE: Liver tissue

PROPERTIES: Human TGF-β-like protein (MP-121)



25 CKTOCfiGOCr CCKUGMCTT CTQCATOGOG CMXQOOCAC TfiOOVIRGC 60 

QGCCACTQCTG GCXOCmCA QOOXZDS CTCAMCnC TOttOaOCAA 120 

ooasoa Qcrnoocro grggbqqcic Ma^rorom axaoarr amuTn* 180 

QZCZCIGCTC TKnKTGKA QQSOOCAA COTTSTCMG ACIGACMaC dQKRKXZT 240 

AGIRGM30C TOTOQOTQCA GITOG 265 
SEQ ID NO: 3

SEQUENCE TYPE: Amino acid
SEQUENCE LENGTH: 401 amino acids

ORIGINAL SOURCE: -
ORGANISM: human
IMMEDIATE EXPERIMENTAL SOURCE:

PROPERTIES: Human TGF-β-like protein (MP-52)

^ KjG^HPKKa FFOTBQ^IAR T ViVKGQfP G (XBPPKAGSV FSSFUKKBR EPGPPREPKE 60 

PWm'JVH EXMLSLVREL SD RU KKQ3C SVKIEfiGEAN T1TSFIEKGQ ECRGFWKRQ 120 

RYVEDISAIB KDGCIGBEER IUKKFSCOA KEAAFGQ3RA. AQLKLSSCPS GRQPASEUV 180 

RSVFGLEGSG WBVFDIWKIF KNEXNSAQLC IEEEHKERGR AVDLHGLSPD RAARQVHEKA 240 

25 IILVFGKEKK EXEfTHEIKA RSGQPCKTVY EYLF9QKKKK BAFIAIRQGC KPSKNEXARC 300 

SRRHLflVNEK DCWTXWIIA FLEXEAFBCE E5PXNHAVIQ TLMBMT3PES 360 

TFPTOCVPTR LSPISILETD SANNWXKQJf ETKWESOQ^ R 401 



Claims

1. DNA sequence encoding a protein of the TGF-β family selected from the following group:

(a) a DNA sequence comprising the nucleotides ATG AAC TCC ATG GAC CCC GAG TCC ACA with the
reading frame for the protein starting at the first nucleotide

(b) a DNA sequence comprising the nucleotides CTT CTC AAG GCC AAC ACA GCT GCA GGC ACC with the
reading frame for the protein starting at the first nucleotide

(c) DNA sequences which are degenerate as a result of the genetic code from the DNA sequences of (a) and
(b)

(d) allelic derivatives of the DNA sequences of (a) and (b) encoding proteins exhibiting essentially the same
properties as the proteins encoded by (a) or (b).

(e) DNA sequences hybridizing to the DNA sequences in (a), (b), (c) or (d) and encoding a protein containing
the amino acid sequence

Met-Asn-Ser-Met-Asp-Pro-Glu-Ser-Thr
or
Leu-Leu-Lys-Ala-Asn-Thr-Ala-Ala-Gly-Thr

(f) DNA sequences hybridizing to the DNA sequences in (a), (b), (c) under stringent conditions.
2. The DNA sequence according to claim 1 which is a vertebrate DNA sequence or a mammalian DNA sequence.

3. The DNA sequence according to claim 2, wherein the mammalian sequence is a primate, human, porcine, bovine,
or rodent DNA sequence.

4. The DNA sequence according to claim 3, wherein the rodent sequence is a rat or mouse DNA sequence.

5. The DNA sequence according to claim 1 or 2 which is a DNA sequence comprising the nucleotides as shown in
SEQ ID NO. 1.

6. The DNA sequence according to claim 1 or 2 which is a DNA sequence comprising the nucleotides as shown in
SEQ ID NO. 2.

7. A recombinant DNA molecule comprising a DNA sequence according to any one of claims 1 to 6.

8. The recombinant DNA molecule according to claim 7 in which said DNA sequence is functionally linked to an
expression-control sequence.

9. A host containing a recombinant DNA molecule according to claim 7 or 8.

10. The host according to claim 9 which is a bacterium, a fungus, a plant cell or an animal cell.

11. A process for the production of a protein of the TGF-β family comprising cultivating a host according to claim 9 or
10 and recovering said TGF-β protein from the culture.

12. A protein of the TGF-β family encoded by a DNA sequence according to any one of claims 1 to 6.

13. A protein according to claim 12 comprising the amino acid sequence of SEQ ID NO: 3.

14. A pharmaceutical composition containing a protein of the TGF-β family according to claim 12 or 13 optionally in
combination with a pharmaceutically acceptable carrier.

15. The pharmaceutical composition according to claim 14 for the treatment of various bone, cartilage or tooth defects,
and for use in wound and tissue repair processes.

16. An antibody or antibody fragment which is capable of specifically binding to a protein of claims 12 or 13 but does
not bind other BMP-like or inhibin proteins.

17. Antibody or antibody fragment according to claim 16 which is a monoclonal antibody.

18. Diagnostic agent comprising an antibody according to any one of claims 16 or 17.
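
The relationship between the nucleotide stretches of claim 1 (a) and (b) and the peptides of claim 1 (e) can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function names `translate` and `count_degenerate` are illustrative and not part of the specification. It translates each stretch from its first nucleotide and, as an illustration of claim 1 (c), counts how many distinct DNA sequences encode the same peptide.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def codons(dna: str) -> list[str]:
    """Split a DNA string into complete codons, reading from the first nucleotide."""
    seq = dna.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(dna: str) -> str:
    """Translate DNA into a hyphen-separated three-letter peptide."""
    return "-".join(THREE_LETTER[CODON_TABLE[c]] for c in codons(dna))


def count_degenerate(dna: str) -> int:
    """Count the DNA sequences that encode the same peptide as `dna`."""
    total = 1
    for c in codons(dna):
        # Number of synonymous codons for this amino acid.
        total *= AMINO.count(CODON_TABLE[c])
    return total


# Nucleotide stretches as recited in claim 1 (a) and (b).
CLAIM_1A = "ATG AAC TCC ATG GAC CCC GAG TCC ACA"
CLAIM_1B = "CTT CTC AAG GCC AAC ACA GCT GCA GGC ACC"

if __name__ == "__main__":
    for name, dna in (("1(a)", CLAIM_1A), ("1(b)", CLAIM_1B)):
        print(name, translate(dna), count_degenerate(dna))
```

Under these assumptions, both stretches translate to the peptides recited in claim 1 (e), and each peptide can be encoded by several thousand or more synonymous DNA sequences, which is the set that claim 1 (c) refers to.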
Claims (German version)

1. DNA sequence encoding a protein of the TGF-β family, selected from the following group:

(a) a DNA sequence comprising the nucleotides ATG AAC TCC ATG GAC CCC GAG TCC ACA, the
reading frame for the protein starting at the first nucleotide,

(b) a DNA sequence comprising the nucleotides CTT CTC AAG GCC AAC ACA GCT GCA GGC ACC,
the reading frame for the protein starting at the first nucleotide,

(c) DNA sequences which are degenerate from the DNA sequences of (a) and (b) as a result of the genetic
code,

(d) allelic derivatives of the DNA sequences of (a) and (b) encoding proteins that exhibit essentially the
same properties as the proteins encoded by (a) or (b),

(e) DNA sequences that hybridize to the DNA sequences in (a), (b), (c) or (d) and encode a protein
containing the amino acid sequence

Met-Asn-Ser-Met-Asp-Pro-Glu-Ser-Thr
or
Leu-Leu-Lys-Ala-Asn-Thr-Ala-Ala-Gly-Thr,

(f) DNA sequences that hybridize to the DNA sequences in (a), (b), (c) under stringent conditions.

2. DNA sequence according to claim 1 which is a vertebrate DNA sequence or a mammalian DNA sequence.

3. DNA sequence according to claim 2, wherein the mammalian sequence is a primate, human, porcine, bovine or
rodent DNA sequence.

4. DNA sequence according to claim 3, wherein the rodent sequence is a rat or mouse DNA sequence.

5. DNA sequence according to claim 1 or 2 which is a DNA sequence comprising the nucleotides as shown in
SEQ ID NO. 1.

6. DNA sequence according to claim 1 or 2 which is a DNA sequence comprising the nucleotides as shown in
SEQ ID NO. 2.

7. Recombinant DNA molecule comprising a DNA sequence according to any one of claims 1 to 6.

8. Recombinant DNA molecule according to claim 7 in which the DNA sequence is functionally linked to an
expression-control sequence.

9. Host comprising a recombinant DNA molecule according to claim 7 or 8.

10. Host according to claim 9 which is a bacterium, a fungus, a plant cell or an animal cell.

11. Process for the production of a protein of the TGF-β family comprising cultivating a host according to
claim 9 or 10 and recovering the TGF-β protein from the culture.

12. Protein of the TGF-β family encoded by a DNA sequence according to any one of claims 1 to 6.

13. Protein according to claim 12 comprising the amino acid sequence of SEQ ID NO. 3.

14. Pharmaceutical composition containing a protein of the TGF-β family according to claim 12 or 13, optionally
in combination with a pharmaceutically acceptable carrier.

15. Pharmaceutical composition according to claim 14 for the treatment of various bone, cartilage or tooth defects
and for use in wound and tissue healing processes.

16. Antibody or antibody fragment which is capable of specifically binding to a protein of claims 12 or 13 but
which does not bind other BMP-like or inhibin proteins.

17. Antibody or antibody fragment according to claim 16 which is a monoclonal antibody.

18. Diagnostic agent comprising an antibody according to any one of claims 16 or 17.

Claims (French version)

1. DNA sequence encoding a protein of the TGF-β family, selected from the following group:

(a) a DNA sequence comprising the nucleotides ATG AAC TCC ATG GAC CCC GAG TCC ACA, the reading
frame for the protein starting at the first nucleotide;

(b) a DNA sequence comprising the nucleotides CTT CTC AAG GCC AAC ACA GCT GCA GGC ACC, the
reading frame for the protein starting at the first nucleotide;

(c) DNA sequences which are degenerate, as a result of the genetic code, from the DNA sequences
of (a) and (b);

(d) allelic derivatives of the DNA sequences of (a) and (b) encoding proteins exhibiting essentially
the same properties as the proteins encoded by (a) or (b);

(e) DNA sequences hybridizing to the DNA sequences of (a), (b), (c) or (d) and encoding a protein
containing the amino acid sequence Met-Asn-Ser-Met-Asp-Pro-Glu-Ser-Thr or Leu-Leu-Lys-Ala-Asn-Thr-
Ala-Ala-Gly-Thr;

(f) DNA sequences hybridizing to the DNA sequences of (a), (b), (c) under stringent conditions.

2. DNA sequence according to claim 1 which is a vertebrate DNA sequence or a mammalian DNA sequence.

3. DNA sequence according to claim 2, wherein the mammalian DNA sequence is a primate, human, porcine,
bovine or rodent DNA sequence.

4. DNA sequence according to claim 3, wherein the rodent sequence is a rat or mouse DNA sequence.

5. DNA sequence according to claim 1 or 2 which is a DNA sequence comprising the nucleotides shown
in SEQ ID NO. 1.

6. DNA sequence according to claim 1 or 2 which is a DNA sequence comprising the nucleotides shown
in SEQ ID NO. 2.

7. Recombinant DNA molecule comprising a DNA sequence according to any one of claims 1 to 6.

8. Recombinant DNA molecule according to claim 7, in which said DNA sequence is functionally linked
to an expression-control sequence.

9. Host containing a recombinant DNA molecule according to claim 7 or 8.

10. Host according to claim 9 which is a bacterium, a fungus, a plant cell or an animal cell.

11. Process for the production of a protein of the TGF-β family, comprising culturing a host according to
claim 9 or 10 and recovering said TGF-β-type protein from the culture.

12. Protein of the TGF-β family encoded by a DNA sequence according to any one of claims 1 to 6.

13. Protein according to claim 12, comprising the amino acid sequence of SEQ ID NO: 3.

14. Pharmaceutical composition containing a protein of the TGF-β family according to claim 12 or 13,
optionally in combination with a pharmaceutically acceptable carrier.

15. Pharmaceutical composition according to claim 14, for the treatment of various bone, cartilage or
tooth defects, and for use in wound and tissue repair processes.

16. Antibody or antibody fragment capable of binding specifically to a protein of claims 12 or
13, but which does not bind to other BMP-type or inhibin proteins.

17. Antibody or antibody fragment according to claim 16 which is a monoclonal antibody.

18. Diagnostic agent comprising an antibody according to any one of claims 16 or 17.
Figure 1a

10 20 30 40 50 

MP 52 CSRKALHVNF KDMGWDDWII APLEYEAFHC EGLCEFPLRS HLEPTNHAVI

BMP 2 CKRHPLYVDF SDVGWNDWIV APPGYHAFYC HGECPFPLAD HLNSTNHAIV 

BMP 4 CRRHSLYVDF SDVGWNDWIV APPGYQAFYC HGDCPFPLAD HLNSTNHAIV 

BMP 5 CKKHELYVSF RDLGWQDWII APEGYAAFYC DGECSFPLNA HMNATNHAIV 

BMP 6 CRKHELYVSF QDLGWQDWII APKGYAANYC DGECSFPLNA HMNATNHAIV 

BMP 7 CKKHELYVSF RDLGWQDWII APEGYAAYYC EGECAFPLNS YMNATNHAIV 

ft + ft ft ft ft ftft ftftft+. ftft- ft ft+ ft ft Oftft + + + ftftftft 

60 70 80 90 100

MP 52 QTLMNSMDPE STPPTCCVPT RLSPISILFI DSANNWYKQ YEDMWESCG . CR 

BMP 2 QTLVNSVNS- KIPKACCVPT ELSAISMLYL DENEKWLKN YQDMVVEGCG CR 

BMP 4 QTLVNSVNS- SIPKACCVPT ELSAISMLYL DEYDKWLKN YQEMWEGCG CR 

BMP 5 QTLVHLMFPD HVPKPCCAPT KLNAISVLYF DDSSNVILKK YRNMWRSCG CH 

BMP 6 QTLVHLMNPE YVPKPCCAPT KLNAISVLYF DDNSNVILKK YRNMWRACG CH 

BMP 7 QTLVKFINPE. TVPKPCCAPT QLNAISVLYF DDSSNVILKK YRNMWRACG CH 

ft»* + ++ ++ + » «« + ** * + «« « ft + « + ft * +***++** »+ 
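
The degree of conservation shown in Figure 1a can be quantified as percent identity over aligned positions. The following is a minimal sketch; the function name `percent_identity` is illustrative, and the two sequences are positions 1 to 50 of the MP 52 and BMP 2 rows exactly as they appear in the scanned figure, so any scanning errors in those rows would carry over into the result.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Spaces (column separators in the figure) are ignored, and positions where
    either sequence has a gap character '-' are skipped.
    """
    a = a.replace(" ", "")
    b = b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Positions 1-50 of two rows of Figure 1a, as transcribed from the figure.
MP_52 = "CSRKALHVNF KDMGWDDWII APLEYEAFHC EGLCEFPLRS HLEPTNHAVI"
BMP_2 = "CKRHPLYVDF SDVGWNDWIV APPGYHAFYC HGECPFPLAD HLNSTNHAIV"

if __name__ == "__main__":
    print(f"MP 52 vs BMP 2, positions 1-50: {percent_identity(MP_52, BMP_2):.1f}% identity")
```

The same function can be applied to any other pair of rows, provided both strings are copied with their gap characters so that the columns stay aligned.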
Figure 1b

10 20 30 40 50 

MP121 IQPEGYAMNF CIGQCPLHIA GMPGIAASFH TAVLNLLKAN TAAGTTGGGS 

InhibβA IAPSGYHANY CEGEGPSHIA GTSGSSLSFH STVINHYRMR GHSPFANLKS

InhibβB IAPTGYYGNY CEGSCPAYLA GVPGSASSFH TAWNQYRMR GLNP-GTVNS

Inhibα VYPPSFIFHY CHGGCGLHIP PNLSLP VPGAPPTPAQ PYSLLPGAQP

+ * ++ + * * + ++ + *++ +++ + + + + 

60 70 80 90 

MP121 CC--VPTARR PLSLLYYDRD SNIVKTD-I? DMWEACGCS

InhibβA CC--VPTKLR PMSMLYYDDG QNIIKKD-IQ NKIVEECGCS

InhibβB CC--IPTKLS TMSMLYFDDE YNIVKRD-V? NMIVEECGCA

Inhibα CCAALPGTMR PLHVRTTSDG GYSFKYETVP NLLTQHCACI

** +*+ + +++ ++.++ +++* + ++ + ++ * + * + 
Figure 2a



              Eco RI Nco I

OD          ATGAATTCCCATGGACCTGGGCTGGMAKGAMTGGAT

BMP 2                     ACGTGGGGTGGAATGACTGGAT

BMP 3                     ATATTGGCTGGAGTGAATGGAT

BMP 4                     ATGTGGGCTGGAATGACTGGAT

BMP 7                     ACCTGGGCTGGCAGGACTGGAT

TGF-β1                    AGGACCTCGGCTGGAAGTGGAT

TGF-β2                    GGGATCTAGGGTGGAAATGGAT

TGF-β3                    AGGATCTGGGCTGGAAGTGGGT

inhibin α                 AGCTGGGCTGGGAACGGTGGAT

inhibin βA                ACATCGGCTGGAATGACTGGAT

inhibin βB                TCATCGGCTGGAACGACTGGAT



Figure 2b 



              Eco RI

OID         ATGAATTCGAGCTGCGTSGGSRCACAGCA
BMP 2               GAGTTCTGTCGGGACACAGCA
BMP 3               CATCTTTTCTGGTACACAGCA
BMP 4               CAGTTCAGTGGGCACACAACA
BMP 7               GAGCTGCGTGGGCGCACAGCA
TGF-β1              CAGCGCCTGCGGCACGCAGCA
TGF-β2              TAAATCTTGGGACACGCAGCA
TGF-β3              CAGGTCCTGGGGCACGCAGCA
inhibin α           CCCTGGGAGAGCAGCACAGCA
inhibin βA          CAGCTTGGTGGGCACACAGCA
inhibin βB          CAGCTTGGTGGGAATGCAGCA
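
The oligonucleotides OD and OID in Figures 2a and 2b contain the ambiguity letters M, K, S and R, so each stands for a small pool of concrete sequences. The following is a minimal sketch, assuming the standard IUPAC nucleotide ambiguity codes; the helper names `expand` and `matches` are illustrative. It enumerates the pool for the gene-specific 3' portion of each oligonucleotide (the part following the restriction-site extension) and tests whether a family member's sequence, as transcribed from the figures, falls inside that pool.

```python
import re
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "S": "CG",
    "Y": "CT", "W": "AT", "N": "ACGT",
}


def expand(primer: str) -> list[str]:
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def matches(primer: str, target: str) -> bool:
    """True if `target` is one of the sequences the degenerate primer stands for."""
    pattern = "".join(f"[{IUPAC[b]}]" for b in primer)
    return re.fullmatch(pattern, target) is not None


# Full oligonucleotides as shown in Figures 2a and 2b.
OD = "ATGAATTCCCATGGACCTGGGCTGGMAKGAMTGGAT"
OID = "ATGAATTCGAGCTGCGTSGGSRCACAGCA"

# 3' portions that line up with the 22-mer and 21-mer rows of the figures.
OD_3PRIME = OD[-22:]
OID_3PRIME = OID[-21:]

# Rows transcribed from the figures.
BMP_7_FIG_2A = "ACCTGGGCTGGCAGGACTGGAT"
BMP_2_FIG_2A = "ACGTGGGGTGGAATGACTGGAT"
BMP_7_FIG_2B = "GAGCTGCGTGGGCGCACAGCA"

if __name__ == "__main__":
    print(len(expand(OD_3PRIME)), "variants for OD;", len(expand(OID_3PRIME)), "for OID")
    print("BMP 7 within OD pool:", matches(OD_3PRIME, BMP_7_FIG_2A))
    print("BMP 2 within OD pool:", matches(OD_3PRIME, BMP_2_FIG_2A))
    print("BMP 7 within OID pool:", matches(OID_3PRIME, BMP_7_FIG_2B))
```

A sequence that falls outside the pool is not necessarily missed in practice, since a primer can still anneal with a few mismatches; the exact-match test only shows which family members the degenerate positions were designed to cover perfectly.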